Establishment of a nomogram to predict the overall survival of patients with collecting duct renal cell carcinoma

Background: Collecting duct carcinoma (CDC) is a rare histological type of renal cell carcinoma that lacks a prognostic prediction model. In this study, we developed a nomogram to predict the prognosis of CDC patients.
Methods: Data for patients (n = 247) diagnosed with CDC from 2004 to 2015 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database, and the patients were randomized into training (n = 165) and validation (n = 82) cohorts. Survival outcomes were evaluated by the Kaplan–Meier method. Significant variables determined by univariate and multivariate Cox regression analyses were used to construct the nomogram. C-indexes and calibration plots were applied to evaluate the performance of the nomogram.
Results: CDC patients had a median overall survival (OS) of 18.0 months (95% confidence interval: 13.7–22.3); 1-year, 3-year, and 5-year OS rates were 58.7%, 34.2%, and 29.4%, respectively. Independent prognostic factors, including age at diagnosis, tumor size, tumor grade, T stage, N stage, M stage, and surgery information, were identified by multivariate analysis. The nomogram was constructed based on significant factors in the training cohort. The C-indexes were 0.769 (training cohort) and 0.767 (validation cohort). The calibration curves for survival rates showed that the predicted and observed values were consistent.
Conclusions: This study constructed a nomogram to predict prognosis in patients with CDC. The nomogram performed well in predicting the 1-year, 3-year, and 5-year OS, which can help doctors actively monitor and follow up patients.
Supplementary Information: The online version contains supplementary material available at 10.1007/s12672-024-01140-8.


Introduction
Collecting duct carcinoma (CDC), which accounts for less than 3% of renal cell carcinoma (RCC) cases, is a rare pathological type characterized by an aggressive phenotype and poor prognosis [1–3]. CDC derives from the renal collecting duct epithelium. It has a high propensity for distant metastasis, with poor pathological differentiation and symptoms [4,5]. Many efforts, including advances in surgical techniques, radiotherapy, chemotherapy, targeted therapy, immunotherapy, and improvements in supportive treatment, have been made to increase the overall survival (OS) of patients with CDC [4,6–8].
Despite modest advancements in therapy, patients with CDC have a poor prognosis. According to published studies, approximately 50% of CDC patients die from the cancer within 2 years of their initial diagnosis [9,10]. Because of the rarity of the disease and the lack of available data, few prognostic models exist for predicting the survival outcome of patients with CDC. Indeed, only May and colleagues have developed a risk model, which was based on 95 patients with CDC, to predict prognosis using histopathologic and clinical parameters [9]. However, a large-population study is still needed to predict the individualized survival of patients with CDC. The Surveillance, Epidemiology, and End Results (SEER) database covers 28% of the US population and includes cancer stage, treatment modalities, and survival data. As the database includes a large number of multi-institutional cases, it may be an essential source for rare disease research, providing strong statistical power and possibly contributing to a better understanding of CDC.
In this retrospective study, we analyzed and identified the clinicopathological features and independent prognostic factors of patients with CDC based on data retrieved from the SEER database. We further developed and validated a nomogram to aid in prognostic evaluation for individual patients with CDC.

Data collection
The data used in this study were obtained from the online, publicly available SEER database. A data agreement form was signed and submitted to the SEER administration. As this research did not include human participants or experimental animals, it was deemed exempt by the Domain-Specific Review Board.
We extracted the data for this study from SEER using SEER*Stat software (Version 8.3.6). First, we identified 245,190 patients with renal tumors (International Classification of Disease for Oncology, C64) between 2004 and 2015 in the database. Then, 439 CDC patients were identified by histologic confirmation, of whom 192 were excluded because of unknown variables (race, n = 5; marital status, n = 13; primary tumor size, n = 90; tumor location, n = 2; tumor grade, n = 69; T stage, n = 4; N stage, n = 1; surgery information, n = 7; and survival time, n = 1). Surgery information was defined as partial or radical nephrectomy, excluding local tumor destruction such as laser ablation (n = 2), cryosurgery (n = 2), or photodynamic therapy (n = 1). Ultimately, 247 CDC patients were eligible for analysis. The process used to generate our analytic cohort is depicted in Fig. 1.
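The exclusion counts reported above can be checked with a few lines of arithmetic. The sketch below is only a consistency check on the published numbers, not the authors' extraction code; the dictionary name and the assumption that each excluded patient is counted under exactly one variable are ours.

```python
# Patients excluded for unknown values, as reported in the text.
# Assumption: each excluded patient is counted under exactly one variable.
excluded_unknown = {
    "race": 5,
    "marital status": 13,
    "primary tumor size": 90,
    "tumor location": 2,
    "tumor grade": 69,
    "T stage": 4,
    "N stage": 1,
    "surgery information": 7,
    "survival time": 1,
}

histologically_confirmed = 439  # CDC patients identified in SEER, 2004-2015

total_excluded = sum(excluded_unknown.values())
eligible = histologically_confirmed - total_excluded

print(f"excluded: {total_excluded}, eligible: {eligible}")
```

Under that assumption the per-variable counts sum to the reported 192 exclusions, which leaves the 247 patients used in the analysis.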

Data preparation and definition
The following 13 variables were obtained from the SEER program: race, sex, age at diagnosis, marital status, tumor location, primary tumor size, tumor grade, tumor-node-metastasis (TNM) stage information, surgical treatment, OS in months, and vital status. The "Others" race category included American Indian/Alaska Native and Asian/Pacific Islander patients. Marital status was assessed as single or married, with "single" including never married, separated/divorced, and widowed, according to SEER marital status categories. X-tile software (Version 3.6.1) was used to transform "age at diagnosis" into a categorical variable (Supplementary Fig. 1). Primary tumor size was categorized according to the 7th AJCC staging system as ≤ 4 cm, 4-7 cm (including 7 cm), 7-10 cm (including 10 cm), and > 10 cm [11].
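The tumor size grouping described above can be written as a small function. This is a minimal sketch of the stated cut-offs only; the function name and the category labels are illustrative and do not come from the SEER data dictionary.

```python
def tumor_size_category(size_cm: float) -> str:
    """Map a primary tumor size in cm to the four groups used in the text.

    Upper bounds are inclusive: 7 cm falls in "4-7 cm" and 10 cm in "7-10 cm".
    """
    if size_cm <= 0:
        raise ValueError("tumor size must be positive")
    if size_cm <= 4:
        return "<=4 cm"
    if size_cm <= 7:
        return "4-7 cm"
    if size_cm <= 10:
        return "7-10 cm"
    return ">10 cm"


for size in (3.5, 7.0, 7.1, 12.0):
    print(size, "->", tumor_size_category(size))
```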

Nomogram construction and validation
The enrolled CDC patients were randomly divided into two groups: a training cohort (n = 165) and a validation cohort (n = 82). A nomogram was developed using the training cohort. Prognostic factors were identified by univariate and multivariate Cox regression analyses, and the significant variables were used to establish the nomogram for predicting 1-, 3-, and 5-year OS rates. The established nomogram was then internally (in the training cohort) and externally (in the validation cohort) validated. The C-index was employed to evaluate the discrimination of the nomogram [12]: a C-index between 0.71 and 0.90 was taken to indicate good discriminative ability, and a C-index > 0.90 to indicate high accuracy. The calibration of the nomogram was evaluated with a calibration plot, and a plot falling on the diagonal 45° line was considered to indicate a useful nomogram.
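To make the discrimination measure concrete, the following sketch computes a concordance index on toy data. It is a simplified, pure-Python illustration and not the routine used in the study (the analysis was run in R); pairs with tied survival times are skipped here for brevity, and the example values are invented.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell-type C-index.

    A pair is comparable when the patient with the shorter follow-up
    had the event. It is concordant when that patient also has the
    higher predicted risk; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in months, event flag (1 = death), risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risk)
print(round(c_index, 3))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the thresholds above are read.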

Statistical analysis
The final endpoints of this study were the 1-, 3-, and 5-year OS rates. Differences between the training and validation cohorts were assessed by the chi-square test or Fisher's exact test. Survival outcomes were determined by the Kaplan-Meier method and compared with the log-rank test. The statistical analysis was performed using the R software environment for statistical computing and graphics (version 4.0.0) with the packages rms, ggplot2, survival, and survminer. A p value less than 0.05 was considered significant.
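For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines. This is an illustrative pure-Python version on invented data, not the R code used for the analysis; the function names are ours.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up times (e.g. months)
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Toy data (hypothetical), in months.
curve = kaplan_meier([3, 8, 8, 14, 20, 27], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
print("median:", median_survival(curve))
```

The 1-, 3-, and 5-year OS rates correspond to reading the estimated curve at 12, 36, and 60 months.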
The nomogram was established based on the training cohort (n = 165), and models were constructed by univariate and multivariate regression analyses (Table 2).
Univariate survival analysis revealed age at diagnosis, primary tumor size, surgery, tumor grade, T stage, N stage, and M stage to be significant prognostic factors for OS, and these seven significant risk factors were examined by multivariate analysis. The results showed that age ≥ 71 (hazard ratio [HR]: 1.730, 95% confidence interval [CI] 1.103-2.715, p = 0.017), tumor diameter 7-10 cm (HR: 2.032, 95% CI 1.042-4.251, p = 0.048) or > 10 cm (HR: 2.228, 95% CI 1.179-4.213, p = 0.014), tumor grade III/IV (HR: 1.796, 95% CI 1.027-3.628, p = 0.016), T3/4 stage (HR: 1.479, 95% CI 1.009-2.487, p = 0.021), N1 3.101, 95% CI 1.572-6.117, p = 0.001) were independent prognostic factors for OS. These significant factors, assigned different scores according to their prognostic value, were used to construct the nomogram for predicting 1-, 3-, and 5-year OS rates (Fig. 3). The scales of age, tumor size, tumor grade, T stage, N stage, M stage, and surgery ranged from 0 to 47.5, 0 to 68, 0 to 49, 0 to 34, 0 to 31, 0 to 72.5, and 0 to 100, respectively. Survival probability was then estimated by calculating the total score. To use the nomogram, the specific points of an individual patient are located on each variable axis. The sum of these points is located on the Total Points axis, and a vertical line drawn downward to the survival axes gives the probability of OS.
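The scoring step can be illustrated with a short sketch that sums the points read from each variable axis. Only the maximum of each scale is taken from the text; the points for intermediate categories and the mapping from total points to survival probability must be read from Fig. 3 and are not reproduced here. The function name and the example patient are hypothetical.

```python
# Upper end of each variable's point scale, as reported in the text.
MAX_POINTS = {
    "age": 47.5,
    "tumor size": 68.0,
    "tumor grade": 49.0,
    "T stage": 34.0,
    "N stage": 31.0,
    "M stage": 72.5,
    "surgery": 100.0,
}


def total_points(points_read):
    """Sum the points read from the nomogram axes for one patient.

    points_read maps each of the seven variables to the points read
    from its axis; values outside the reported scale are rejected.
    """
    if set(points_read) != set(MAX_POINTS):
        raise ValueError("points are required for all seven variables")
    for name, pts in points_read.items():
        if not 0 <= pts <= MAX_POINTS[name]:
            raise ValueError(f"{name}: {pts} is outside 0-{MAX_POINTS[name]}")
    return sum(points_read.values())


# Hypothetical patient: top of the age and M stage scales, zero elsewhere.
patient = {name: 0.0 for name in MAX_POINTS}
patient["age"] = 47.5
patient["M stage"] = 72.5
print(total_points(patient))              # 120.0
print(total_points(dict(MAX_POINTS)))     # 402.0, the largest possible total
```

The total is then located on the Total Points axis of the nomogram to read off the 1-, 3-, and 5-year OS probabilities.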

Nomogram validation
We first evaluated the nomogram in the training set, with a C-index for OS of 0.769 (95% CI 0.697-0.841). Calibration curves of the 1-, 3-, and 5-year OS rates showed that the predicted and observed values of the nomogram were consistent (Fig. 4). Then, the validation cohort was used to assess the nomogram, and the C-index for OS prediction was 0.767 (95% CI 0.696-0.838). Furthermore, the calibration curves of the 1-, 3-, and 5-year OS rates also revealed that the predicted and observed values were consistent (Fig. 4). As illustrated in Fig. 4, the line segments in the calibration plots are close to the 45° line, which indicates good predictive ability and demonstrates that our nomogram is useful for the prediction of 1-, 3-, and 5-year OS.

Discussion
The tumor-node-metastasis (TNM) staging system is the most commonly used method for predicting the survival outcome of RCC. However, patients with different histological types may have considerable survival differences, even at the same stage of RCC. CDC is a rare type of RCC that is associated with poor survival outcomes and for which few clinicopathological data are available [3]. Nomograms have been shown to be a promising method to predict the prognosis of patients with some rare tumor types [13–15]. Thus, we sought to develop and validate a prognostic nomogram. In this study, we identified a large sample of patients with CDC from the SEER database and developed a nomogram to predict OS. The results showed that the nomogram performed well in the internal and external validation cohorts. those of the abovementioned reports. Furthermore, survival analysis showed that the median OS for CDC patients was 18.0 months. The rates of 1-, 3-, and 5-year OS were 58.7%, 34.2%, and 29.4%, respectively.
There is only one large study that aimed to develop a model for the prediction of the survival outcome of CDC. May and colleagues proposed a prognostic scoring model consisting of clinical and pathological parameters based on 95 CDC patients, including the American Society of Anesthesiologists score, tumor size, Fuhrman grade, lymphovascular invasion status, and metastasis presentation. The final evaluation of the model was accurate, with a C-index value of 0.894 [9]. In our study, we used a relatively large population (n = 247) and divided these patients into training and validation cohorts. In the training cohort, 7 variables, namely, age at diagnosis, primary tumor size, surgery, tumor grade, T stage, N stage, and M stage, were associated with prognosis in univariate analysis; multivariate analysis demonstrated these factors to be independent prognostic factors for OS. Therefore, these factors were used to establish the nomogram for predicting 1-, 3-, and 5-year OS rates. In internal validation in the training cohort and external validation in the validation cohort, the C-index values reached 0.769 and 0.767, respectively. These results indicate that our nomogram has good predictive ability.
Research thus far has focused on predicting tumor prognosis by combining molecular markers with clinicopathological factors. The heterogeneity of CDC may also be accompanied by biological differences among patients. Overall, there is a lack of knowledge regarding the biology and molecular architecture of this tumor compared with other RCC subtypes [18]. There have been only a few reports about chromosomal aberrations, mutations, and RNA expression changes in CDCs [19–21]. Although the results of these studies have clarified the aggressiveness and rarity of CDC to some extent, no clear prognostic factors have been identified. Thus, further research is still needed.
Our study has several limitations that need to be acknowledged. Firstly, 192 patients with CDC were excluded because of missing data. Excluding patients with missing values shrinks the sample size and may contribute to selection bias.
Secondly, the SEER database does not contain detailed information about systemic therapies, such as targeted therapy, chemotherapy, and radiotherapy. Experimental information was also not included. These omissions may leave the model incomplete.
Thirdly, the dataset is relatively small, and the study lacks an external validation study.

Fig. 1 The process of screening 247 patients with collecting duct renal cell carcinoma in the SEER database for analysis

Fig. 2 Survival analysis of patients with collecting duct renal cell carcinoma in the training and validation sets

Fig. 3 Nomogram for predicting the 1-year, 3-year and 5-year OS of patients with collecting duct renal cell carcinoma

Conclusion